Fix MalformedInputException on Windows with Java 8 due to charset mis… #1274
Conversation
…match in stale data cache

Changed StaleHelper to always save in UTF-8 instead of the platform-dependent default charset. This ensures consistency with AbstractJavadocMojo.isUpToDate(), which reads the stale data file using UTF-8. Previously on Windows with Java 8:
- First run: file written with Cp1252 (default charset)
- Second run: file read with UTF-8, causing MalformedInputException
Commits a49eba0 to 42fe299
This is basically reverting 33c9f01, which was actually fixing a real problem of inconsistency.
You mean this is reverting https://issues.apache.org/jira/browse/MJAVADOC-614?
The d2dd532 commit brought an inconsistency that I saw with a project containing Chinese characters, and I made it consistent with 33c9f01. I assume this bug comes from the way we determine the charset in that getDataCharset function. It might be necessary to make the decision depend not only on the Java version but also on the OS/architecture. I will test this fix. If it does not make the previous bug come back, we can leave it as it is; but if it regresses, we should look for a proper fix.
@slachiewicz have a look at https://issues.apache.org/jira/browse/MJAVADOC-614; it mentions the intent behind the JDK check, and the fact that the @-files are using UTF-8 on JDK 9..12 and
Actually, as I look at it more and more and understand it better, I think this particular fix is the right one.
But it won't work well on JDK 9, 10 and 11, as I raised above.
I mean, I ran the ITs with OpenJDK 8, 11 and 17 and they passed. My only concern is that the mismatch is between what one gets from plexus-utils as the path string (and its encoding) and how one reads the line. The thing is that, as I understand it now, it boils down to the filesystem encoding itself for Java 8. Because the
On Windows, where the default encoding is not UTF-8? I think the problem is that javadoc expects a certain encoding, so we have no choice but to actually use that one. If we always write in UTF-8, I don't see how that would work.
Indeed, no
So maybe it would be better to cross-check where the original problem lies, then. I had a thought: if, while reading in the try/catch block, we recorded the charset we used for reading and then wrote using that same one, it could work. But I am not sure, because the exception is not necessarily thrown when the bytes happen to be valid UTF-8 while actually being in another encoding.
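The "record which charset the read succeeded with, then write with that one" idea could be sketched roughly as below. This is a hypothetical helper, not code from the plugin; as noted above, it is unreliable when bytes in another encoding happen to decode as valid UTF-8.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch of the "try UTF-8, fall back to the default charset" idea. */
public class StaleDataReader {
    /**
     * Tries to read the stale data file as UTF-8; if decoding fails
     * (MalformedInputException is an IOException subclass), falls back to the
     * platform default charset. Returns the charset that worked, so a later
     * write could reuse it.
     */
    public static Charset detectAndRead(Path file, List<String> linesOut) throws IOException {
        try {
            // readAllLines either returns the whole file or throws; no partial fill.
            linesOut.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
            return StandardCharsets.UTF_8;
        } catch (IOException utf8Failure) {
            linesOut.addAll(Files.readAllLines(file, Charset.defaultCharset()));
            return Charset.defaultCharset();
        }
    }
}
```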
Or we extract getDataCharset() into SystemUtils or somewhere similar, make it public, and use it everywhere we need the charset.
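Extracting the decision into one shared, public method might look like the sketch below. The class name and the exact version boundary are assumptions for illustration; they mirror the behavior described in this thread (Java 8 uses Charset.defaultCharset(), later JDKs use UTF-8), not the plugin's actual sources.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical shared utility so that the writer (StaleHelper) and the
 * reader (AbstractJavadocMojo) agree on a single charset decision.
 */
public final class JavadocCharsets {
    private JavadocCharsets() {}

    /** Charset for the stale data file, keyed on the Java major version. */
    public static Charset dataCharset(int javaMajorVersion) {
        // Boundary is illustrative: Java 8 historically used the platform
        // default; JDK 9+ @-files are described above as using UTF-8.
        return javaMajorVersion <= 8 ? Charset.defaultCharset() : StandardCharsets.UTF_8;
    }
}
```

Having one public method avoids the write side and the read side drifting apart, which is exactly the mismatch this PR fixes.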
Let me craft something.
Revert "Fix MalformedInputException on Windows with Java 8 due to charset mis… (apache#1274)"

This reverts commit b453602.
Something like #1278
…match in stale data cache

Problem

Starting in version 3.11.3, running mvn javadoc:javadoc twice in succession on Windows with Java 8 fails on the second run with a MalformedInputException. This regression affects Windows users running Java 8, where the default platform encoding is Cp1252.

Root Cause

The issue stems from a charset mismatch in how the stale data cache file is written versus how it is read:

- Write operation (StaleHelper.writeStaleData(), line 126): uses getDataCharset(), which returns Charset.defaultCharset() for Java 8, i.e. Cp1252 on Windows.
- Read operation (AbstractJavadocMojo.isUpToDate(), line 5008): uses StandardCharsets.UTF_8.

When the second run attempts to read a file written with Cp1252 encoding using UTF-8, any non-ASCII bytes cause a MalformedInputException.
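The mismatch can be reproduced in isolation with a short sketch (hypothetical demo class, not plugin code). ISO-8859-1 is used as a stand-in for Cp1252 here, since both encode 'ü' as the single byte 0xFC, which is not valid UTF-8.

```java
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Minimal reproduction of the write-Cp1252 / read-UTF-8 mismatch. */
public class CharsetMismatchDemo {
    public static boolean reproduces() throws Exception {
        Path tmp = Files.createTempFile("stale-data", ".txt");
        try {
            // "Bücher" in ISO-8859-1 contains the byte 0xFC, invalid in UTF-8.
            Files.write(tmp, "Bücher".getBytes(StandardCharsets.ISO_8859_1));
            Files.readAllLines(tmp, StandardCharsets.UTF_8);
            return false; // no exception: mismatch not reproduced
        } catch (MalformedInputException expected) {
            return true; // the second-run failure described above
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```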
Solution

Changed StaleHelper to always save in StandardCharsets.UTF_8 instead of the platform-dependent default charset. This ensures consistency with AbstractJavadocMojo.isUpToDate(), which reads the stale data file using UTF-8.

Fixes: #1273 #1264
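In essence, the fix pins both sides of the cache to the same charset. A minimal sketch of that shape (the class and method names below are illustrative, not the plugin's actual signatures):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Sketch: writer and reader pinned to the same explicit charset. */
public class StaleCacheSketch {
    /** Write the stale data always in UTF-8 (previously the default charset on Java 8). */
    public static void writeStaleData(Path cache, List<String> options) throws IOException {
        Files.write(cache, options, StandardCharsets.UTF_8);
    }

    /** Read with the same charset, mirroring what isUpToDate() does. */
    public static List<String> readStaleData(Path cache) throws IOException {
        return Files.readAllLines(cache, StandardCharsets.UTF_8);
    }
}
```

Because both operations name StandardCharsets.UTF_8 explicitly, the round trip no longer depends on the platform default, so non-ASCII paths (e.g. the Chinese characters mentioned above) survive the second run.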